Efficient Record De-Duplication Identifying Using Febrl Framework
Similar Resources
An Efficient way of Record Linkage System and Deduplication using Indexing techniques, Classification and FEBRL Framework
Record linkage is an important process in data integration, used to merge, match, and remove duplicate records across several databases that refer to the same entities. Deduplication is the process of removing duplicate records within a single database. In recent years, data cleaning and standardization have become an important part of data mining tasks. Due to the complexity of today’s databases, fin...
A Bayesian Approach to Graphical Record Linkage and De-duplication
We propose an unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation involves the representation of the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible representation...
ViDeDup: An Application-Aware Framework for Video De-duplication
Key to the compression-capability of a data deduplication system is the definition of redundancy. Traditionally, two data items are considered redundant if their underlying bit-streams are identical. However, this notion of redundancy is too strict for many applications. For example, for a video storage platform, two videos encoded in different formats would be unique at the system level but re...
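The traditional notion of redundancy mentioned in this abstract (two items are redundant only if their bit-streams are identical) can be illustrated with a minimal sketch. The function name `dedup_exact` and the sample file names are hypothetical and are not taken from the ViDeDup paper; the sketch only shows content-hash de-duplication, not the application-aware approach the paper proposes.

```python
import hashlib

def dedup_exact(items):
    """Keep the first item for each distinct byte content.

    items: iterable of (name, data) pairs, where data is bytes.
    Returns the names of the items that were kept, in input order.
    """
    seen = set()      # SHA-256 digests of content already stored
    unique = []
    for name, data in items:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique

# Hypothetical sample: two byte-identical files and one that differs.
sample = [
    ("a.mp4", b"\x00\x01"),
    ("b.mp4", b"\x00\x01"),
    ("c.avi", b"\x00\x02"),
]
kept = dedup_exact(sample)
```

Under this definition, the same video stored in two different encodings yields different digests, so both copies are kept; that is the strictness the abstract points to.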
An Efficient Algorithm for De-duplication of Demographic Data
This paper proposes an efficient algorithm for de-duplication based on demographic information that contains two name strings, viz. GivenName and Surname, of individuals. The algorithm consists of two stages: enrolment and de-duplication. In both stages, all name strings are reduced to generic name strings with the help of phonetic-based reduction rules. Thus there may be several name strings havin...
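The idea of reducing name strings to generic name strings can be sketched as follows. This is only an illustration under stated assumptions: the reduction rules in `_RULES` and the helper names `generic_name` and `candidate_duplicates` are hypothetical and are not the rules or stages defined in the paper, whose details are not visible in the truncated abstract.

```python
import re
from collections import defaultdict

# Hypothetical phonetic reduction rules (NOT the paper's rules),
# applied in order to a lower-cased, letters-only name.
_RULES = [
    (r"ph", "f"),
    (r"ck", "k"),
    (r"[aeiou]+", "a"),   # collapse each run of vowels to a single "a"
    (r"(.)\1+", r"\1"),   # collapse repeated letters
]

def generic_name(name):
    """Reduce a name string to a generic form using the rules above."""
    s = re.sub(r"[^a-z]", "", name.lower())
    for pattern, repl in _RULES:
        s = re.sub(pattern, repl, s)
    return s

def candidate_duplicates(records):
    """Group record ids whose GivenName and Surname reduce to the same pair.

    records: iterable of (record_id, given_name, surname).
    Returns a list of id groups that contain more than one record.
    """
    groups = defaultdict(list)
    for rec_id, given, surname in records:
        key = (generic_name(given), generic_name(surname))
        groups[key].append(rec_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample records.
people = [
    (1, "Philip", "Becker"),
    (2, "Phillip", "Beker"),
    (3, "Anna", "Becker"),
]
dupes = candidate_duplicates(people)
```

Records 1 and 2 reduce to the same generic pair and are returned as one candidate group, while record 3 stays separate. A real system would typically confirm such candidates with further comparison, since coarse rules can merge distinct names.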
Journal
Journal title: IOSR Journal of Computer Engineering
Year: 2013
ISSN: 2278-8727,2278-0661
DOI: 10.9790/0661-01022227